2025-01-20 17:54:54 · AIbase
Breakthrough in Large Models: Extracting High-Quality Multimodal Textbooks from Educational Videos
Recently, Zhejiang University and Alibaba DAMO Academy jointly released a study on building high-quality multimodal textbooks from educational videos. The work offers a new source of training data for vision-language models (VLMs) and may also change how educational resources are used. Today, VLM pre-training corpora rely mainly on image-text pairs and interleaved image-text data. However, much of this data is scraped from the web, where images and text are often only loosely related and knowledge density is relatively low.